Import data from a csv


In [97]:
import pandas as pd
df = pd.read_csv('Zip_Zri_MultiFamilyResidenceRental.csv')
df.iloc[[0,1,3]]


Out[97]:
RegionID RegionName City State Metro CountyName SizeRank 2010-09 2010-10 2010-11 ... 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01
0 61639 10025 New York NY New York-Newark-Jersey City New York County 1 2930.0 2952.0 2926.0 ... 3484.0 3523.0 3573.0 3622.0 3664.0 3698.0 3704.0 3692.0 3715.0 3676.0
1 84654 60657 Chicago IL Chicago-Naperville-Elgin Cook County 2 1447.0 1465.0 1469.0 ... 1731.0 1749.0 1768.0 1787.0 1801.0 1806.0 1789.0 1761.0 1747.0 1731.0
3 91982 77494 Katy TX Houston-The Woodlands-Sugar Land Harris County 4 NaN NaN NaN ... 1226.0 1250.0 1264.0 1251.0 1246.0 1252.0 1280.0 1310.0 1294.0 1286.0

3 rows × 120 columns

Fill NaN values with the a string of None


In [98]:
df_none=df.fillna(value='None')
df_none.iloc[[0,1,3]]


Out[98]:
RegionID RegionName City State Metro CountyName SizeRank 2010-09 2010-10 2010-11 ... 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01
0 61639 10025 New York NY New York-Newark-Jersey City New York County 1 2930 2952 2926 ... 3484 3523 3573 3622 3664 3698 3704 3692 3715 3676
1 84654 60657 Chicago IL Chicago-Naperville-Elgin Cook County 2 1447 1465 1469 ... 1731 1749 1768 1787 1801 1806 1789 1761 1747 1731
3 91982 77494 Katy TX Houston-The Woodlands-Sugar Land Harris County 4 None None None ... 1226 1250 1264 1251 1246 1252 1280 1310 1294 1286

3 rows × 120 columns

Fill NaN values with specific numeric values


In [99]:
df_zero=df.fillna(0)
df_zero.iloc[[0,1,3]]


Out[99]:
RegionID RegionName City State Metro CountyName SizeRank 2010-09 2010-10 2010-11 ... 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01
0 61639 10025 New York NY New York-Newark-Jersey City New York County 1 2930.0 2952.0 2926.0 ... 3484.0 3523.0 3573.0 3622.0 3664.0 3698.0 3704.0 3692.0 3715.0 3676.0
1 84654 60657 Chicago IL Chicago-Naperville-Elgin Cook County 2 1447.0 1465.0 1469.0 ... 1731.0 1749.0 1768.0 1787.0 1801.0 1806.0 1789.0 1761.0 1747.0 1731.0
3 91982 77494 Katy TX Houston-The Woodlands-Sugar Land Harris County 4 0.0 0.0 0.0 ... 1226.0 1250.0 1264.0 1251.0 1246.0 1252.0 1280.0 1310.0 1294.0 1286.0

3 rows × 120 columns

Fill NaN values using forward values


In [100]:
df_forward_fill=df.fillna(method='ffill')
df_forward_fill.iloc[[0,1,3]]


Out[100]:
RegionID RegionName City State Metro CountyName SizeRank 2010-09 2010-10 2010-11 ... 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01
0 61639 10025 New York NY New York-Newark-Jersey City New York County 1 2930.0 2952.0 2926.0 ... 3484.0 3523.0 3573.0 3622.0 3664.0 3698.0 3704.0 3692.0 3715.0 3676.0
1 84654 60657 Chicago IL Chicago-Naperville-Elgin Cook County 2 1447.0 1465.0 1469.0 ... 1731.0 1749.0 1768.0 1787.0 1801.0 1806.0 1789.0 1761.0 1747.0 1731.0
3 91982 77494 Katy TX Houston-The Woodlands-Sugar Land Harris County 4 2797.0 2811.0 2813.0 ... 1226.0 1250.0 1264.0 1251.0 1246.0 1252.0 1280.0 1310.0 1294.0 1286.0

3 rows × 120 columns

Let us inspect that a forward fill a little further


In [101]:
df.iloc[[3]][['2010-09','2010-10','2014-04','2014-05','2014-05','2014-06','2016-01','2019-12','2020-01']]


Out[101]:
2010-09 2010-10 2014-04 2014-05 2014-05 2014-06 2016-01 2019-12 2020-01
3 NaN NaN 1093.0 1113.0 1113.0 1118.0 1295.0 1294.0 1286.0

In [102]:
df.iloc[[2,3,4]][['2010-09','2010-10','2014-04','2014-05','2014-05','2014-06','2016-01','2019-12','2020-01']]


Out[102]:
2010-09 2010-10 2014-04 2014-05 2014-05 2014-06 2016-01 2019-12 2020-01
2 2797.0 2811.0 3131.0 3118.0 3118.0 3122.0 3582.0 3561.0 NaN
3 NaN NaN 1093.0 1113.0 1113.0 1118.0 1295.0 1294.0 1286.0
4 1437.0 1469.0 1774.0 1780.0 1780.0 1778.0 1834.0 NaN 1916.0

In [103]:
df_forward_fill.iloc[[3]][['2010-09','2010-10','2014-04','2014-05','2014-05','2014-06','2016-01']]


Out[103]:
2010-09 2010-10 2014-04 2014-05 2014-05 2014-06 2016-01
3 2797.0 2811.0 1093.0 1113.0 1113.0 1118.0 1295.0

In [104]:
df_forward_fill.iloc[[2,3,4]][['2010-09','2010-10','2014-04','2014-05','2014-05','2014-06','2016-01']]


Out[104]:
2010-09 2010-10 2014-04 2014-05 2014-05 2014-06 2016-01
2 2797.0 2811.0 3131.0 3118.0 3118.0 3122.0 3582.0
3 2797.0 2811.0 1093.0 1113.0 1113.0 1118.0 1295.0
4 1437.0 1469.0 1774.0 1780.0 1780.0 1778.0 1834.0

Using ffill for forward fill without passing the axis will forward fill by using the previous rows value for that column. Using the axis parameter we can get the next column value instead of the row value like so:


In [105]:
df_forward_fill_axis_one=df.fillna(method='ffill',axis=1)
df_forward_fill_axis_one.iloc[[2,3,4]][['SizeRank','2010-09','2010-10','2014-04','2014-05','2014-05','2014-06','2016-01']]


Out[105]:
SizeRank 2010-09 2010-10 2014-04 2014-05 2014-05 2014-06 2016-01
2 3 2797 2811 3131 3118 3118 3122 3582
3 4 4 4 1093 1113 1113 1118 1295
4 5 1437 1469 1774 1780 1780 1778 1834

Before the forward fill


In [106]:
df.iloc[[2,3,4]][['SizeRank','2010-09','2010-10','2014-04','2014-05','2014-05','2014-06','2016-01']]


Out[106]:
SizeRank 2010-09 2010-10 2014-04 2014-05 2014-05 2014-06 2016-01
2 3 2797.0 2811.0 3131.0 3118.0 3118.0 3122.0 3582.0
3 4 NaN NaN 1093.0 1113.0 1113.0 1118.0 1295.0
4 5 1437.0 1469.0 1774.0 1780.0 1780.0 1778.0 1834.0

Now for this data set it is probably most appropriate to back fill using the future column values


In [107]:
df_back_fill_axis_one=df.fillna(method='bfill',axis=1)
df_back_fill_axis_one.iloc[[2,3,4]][['SizeRank','2010-09','2010-10','2010-11','2011-02','2014-03','2014-04']]


Out[107]:
SizeRank 2010-09 2010-10 2010-11 2011-02 2014-03 2014-04
2 3 2797 2811 2813 2675 3136 3131
3 4 1081 1081 1081 1081 1081 1093
4 5 1437 1469 1490 1521 1754 1774